Compiling large language resources using lexical similarity metrics for domain taxonomy learning
Abstract
In this contribution we present a new methodology for compiling large language resources for domain-specific taxonomy learning. We describe the stages needed to handle the rich morphology of an agglutinative language, Korean in this case, and outline a second-order machine learning algorithm that uncovers term similarity from a raw text corpus. The language resource compilation described here is part of a fully automatic top-down approach to taxonomy construction that does not require the human effort usually involved.
Similar resources
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
The Relationship between Iranian Upper-Intermediate EFL Learners’ Contrastive Lexical Competence and Their Use of Vocabulary Learning Strategies
Regarding the vital role of lexical competence as an important requisite for the attainment of full mastery of the four language skills, this study tried to investigate the relationship between Iranian EFL learners’ contrastive lexical competence and their use of vocabulary learning strategies. To fulfil this objective, 60 Iranian upper-intermediate male and female language learners were select...
Data-driven Natural Language Generation: Paving the Road to Success
We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, mor...
The Study and Review of Paraphrase Detection Techniques in Machine Learning
Abstract: Paraphrase is a process of computing the semantic similarity between sentences that are not lexicographically similar. Though a number of metrics for the English language have been proposed in the literature to quantify textual similarity, it addresses the problem for detection of monolingual text-text lexical similarity. Existing system for Indian Language paraphrase detection uses lexica...
Measuring Semantic Textual Similarity of Sentences Using Modified Information Content and Lexical Taxonomy
In this paper, we present a survey and comparative study of semantic textual similarity methods that are based on the WordNet taxonomy. We also propose a new method for measuring semantic similarity between sentences. The proposed method uses the advantages of taxonomy methods and merges this information into a language model. It considers the WordNet synsets for lexical relationships between ...